Data Input

Let’s fit our hierarchical model for counts of chocolate chips. The data can be found in cookies.dat.

dat = read.table(file="data_files/cookies.dat", header=TRUE)
head(dat)
##   chips location
## 1    12        1
## 2    12        1
## 3     6        1
## 4    13        1
## 5    12        1
## 6    12        1
table(dat$location)
## 
##  1  2  3  4  5 
## 30 30 30 30 30
hist(dat$chips)

boxplot(chips~location, data=dat)

Prior predictive checks

Before implementing the model, we need to select prior distributions for \(\alpha\) and \(\beta\), the hyperparameters governing the gamma distribution for the \(\lambda\) parameters. First, think about what the \(\lambda\)’s represent. For location \(j\), \(\lambda_j\) is the expected number of chocolate chips per cookie. Hence, \(\alpha\) and \(\beta\) control the distribution of these means between locations. The mean of this gamma distribution represents the overall mean number of chips across all cookies. The variance of this gamma distribution controls the variability between locations: if it is high, the mean number of chips will vary widely from location to location; if it is small, the mean number of chips will be nearly the same from location to location.

To see the effects of different priors on the distribution of \(\lambda\)’s, we can simulate. Suppose we try independent exponential priors for \(\alpha\) and \(\beta\).

set.seed(112)
n_sim = 500
alpha_pri = rexp(n_sim, rate=1.0/2.0)
beta_pri = rexp(n_sim, rate=5.0)
mu_pri = alpha_pri/beta_pri
sig_pri = sqrt(alpha_pri/beta_pri^2)

summary(mu_pri)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.021    2.983    9.852   61.127   29.980 4858.786
summary(sig_pri)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.1834    3.3663    8.5488   41.8137   22.2219 2865.6461

After simulating from the priors for \(\alpha\) and \(\beta\), we can use those samples to simulate further down the hierarchy:

lambda_pri = rgamma(n=n_sim, shape=alpha_pri, rate=beta_pri)
summary(lambda_pri)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.000     1.171     7.667    83.062    28.621 11005.331

Or for a prior predictive reconstruction of the original data set:

(lambda_pri = rgamma(n=5, shape=alpha_pri[1:5], rate=beta_pri[1:5]))
## [1] 66.444084  9.946688  6.028319 15.922568 47.978587
(y_pri = rpois(n=150, lambda=rep(lambda_pri, each=30)))
##   [1] 63 58 64 63 70 62 61 48 71 73 70 77 66 60 72 77 69 62 66 71 49 80 66 75 74
##  [26] 55 62 90 65 57 12  9  7 10 12 10 11  7 14 13  9  6  6 13  7 10 12  9  9 10
##  [51]  7  8  6  9  7 10 13 13  8 12  6 10  3  6  7  4  6  7  5  5  4  3  6  2  8
##  [76]  4  8  4  5  7  1  4  5  3  8  8  3  1  7  3 16 14 13 17 17 12 13 13 16 16
## [101] 15 14 11 10 13 17 16 19 16 17 15 16  7 17 21 16 12 15 14 13 52 44 51 46 39
## [126] 40 40 44 46 59 45 49 58 42 31 52 43 47 53 41 48 57 35 60 51 58 36 34 41 59

Because these priors have high variance and are somewhat noninformative, they produce unrealistic predictive distributions. Still, enough data would overwhelm the prior, resulting in useful posterior distributions. Alternatively, we could tweak and simulate from these prior distributions until they adequately represent our prior beliefs. Yet another approach would be to re-parameterize the gamma prior, which we’ll demonstrate as we fit the model.
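
As a preview of that re-parameterization, here is a minimal prior simulation sketch (the _pri2 names are just for illustration): we place priors directly on the mean \(\mu = \alpha/\beta\) and standard deviation \(\sigma = \sqrt{\alpha}/\beta\) of the gamma distribution, using the same priors that appear in the JAGS model below, and solve for \(\alpha = \mu^2/\sigma^2\) and \(\beta = \mu/\sigma^2\):

mu_pri2 = rgamma(n_sim, shape=2.0, rate=1.0/5.0)  # prior mean of 10 chips
sig_pri2 = rexp(n_sim, rate=1.0)                  # prior mean sd of 1 chip between locations
alpha_pri2 = mu_pri2^2 / sig_pri2^2
beta_pri2 = mu_pri2 / sig_pri2^2
lambda_pri2 = rgamma(n_sim, shape=alpha_pri2, rate=beta_pri2)
summary(lambda_pri2)  # these draws should stay on a much more plausible scale for chip counts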

JAGS Model

library("rjags")
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
mod_string = " model {
for (i in 1:length(chips)) {
  chips[i] ~ dpois(lambda[location[i]])
}

for (j in 1:max(location)) {
  lambda[j] ~ dgamma(alpha, beta)
}

alpha = mu^2 / sig^2
beta = mu / sig^2

mu ~ dgamma(2.0, 1.0/5.0)
sig ~ dexp(1.0)

} "

set.seed(113)

data_jags = as.list(dat)

params = c("lambda", "mu", "sig")

mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 150
##    Unobserved stochastic nodes: 7
##    Total graph size: 315
## 
## Initializing model
update(mod, 1e3)

mod_sim = coda.samples(model=mod,
                       variable.names=params,
                       n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim))

## convergence diagnostics
plot(mod_sim)

gelman.diag(mod_sim)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## lambda[1]          1          1
## lambda[2]          1          1
## lambda[3]          1          1
## lambda[4]          1          1
## lambda[5]          1          1
## mu                 1          1
## sig                1          1
## 
## Multivariate psrf
## 
## 1
autocorr.diag(mod_sim)
##           lambda[1]    lambda[2]   lambda[3]    lambda[4]    lambda[5]
## Lag 0   1.000000000 1.0000000000 1.000000000  1.000000000  1.000000000
## Lag 1   0.018328857 0.1114479354 0.015545767  0.017872357  0.063704594
## Lag 5  -0.005690929 0.0022229186 0.008108424  0.009424213 -0.011090218
## Lag 10 -0.003906936 0.0065247469 0.007608765 -0.003531401 -0.017027506
## Lag 50  0.003734427 0.0006202909 0.017983974  0.010908444 -0.003620628
##                 mu           sig
## Lag 0  1.000000000  1.0000000000
## Lag 1  0.371078734  0.5814369757
## Lag 5  0.036552705  0.1140075583
## Lag 10 0.021757453 -0.0009952316
## Lag 50 0.002956074 -0.0209861221
autocorr.plot(mod_sim)

effectiveSize(mod_sim)
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5]        mu       sig 
## 13800.033 11997.757 14760.597 14672.944 12566.601  5931.104  3515.436
## compute DIC
dic = dic.samples(mod, n.iter=1e3)

Model checking

After assessing convergence, we can check the fit via residuals. With a hierarchical model, there are now two levels of residuals: the observation level and the location mean level. To simplify, we’ll look at the residuals associated with the posterior means of the parameters.

First, we have observation residuals, based on the estimates of location means.

## observation level residuals
(pm_params = colMeans(mod_csim))
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5]        mu       sig 
##  9.279502  6.217697  9.528141  8.943501 11.761976  9.088803  2.091156
yhat = rep(pm_params[1:5], each=30)
resid = dat$chips - yhat
plot(resid)

plot(jitter(yhat), resid)

var(resid[yhat<7])
## [1] 6.447126
var(resid[yhat>11])
## [1] 13.72414

The residual variance increases with the fitted mean, which is consistent with the Poisson likelihood, where the variance equals the mean. Also, we can look at how the location means differ from the overall mean \(\mu\).

## location level residuals
lambda_resid = pm_params[1:5] - pm_params["mu"]
plot(lambda_resid)
abline(h=0, lty=2)

We don’t see any obvious violations of our model assumptions.

Results

summary(mod_sim)
## 
## Iterations = 2001:7000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean     SD Naive SE Time-series SE
## lambda[1]  9.280 0.5377 0.004390       0.004616
## lambda[2]  6.218 0.4652 0.003798       0.004270
## lambda[3]  9.528 0.5454 0.004453       0.004490
## lambda[4]  8.944 0.5258 0.004293       0.004344
## lambda[5] 11.762 0.6213 0.005073       0.005545
## mu         9.089 0.9836 0.008031       0.012805
## sig        2.091 0.7226 0.005900       0.012370
## 
## 2. Quantiles for each variable:
## 
##             2.5%    25%    50%    75%  97.5%
## lambda[1]  8.254  8.914  9.269  9.640 10.363
## lambda[2]  5.331  5.898  6.208  6.529  7.146
## lambda[3]  8.491  9.157  9.512  9.889 10.626
## lambda[4]  7.936  8.587  8.931  9.290 10.002
## lambda[5] 10.573 11.343 11.743 12.178 13.013
## mu         7.199  8.459  9.069  9.681 11.147
## sig        1.095  1.592  1.956  2.436  3.856

Posterior predictive simulation

Just as we did with the prior distribution, we can use these posterior samples to obtain Monte Carlo estimates of quantities of interest from the posterior predictive distribution.

For example, we can use draws from the posterior distribution of \(\mu\) and \(\sigma\) to simulate the posterior predictive distribution of the mean for a new location.

(n_sim = nrow(mod_csim))
## [1] 15000
lambda_pred = rgamma(n=n_sim, shape=mod_csim[,"mu"]^2/mod_csim[,"sig"]^2, 
                  rate=mod_csim[,"mu"]/mod_csim[,"sig"]^2)
hist(lambda_pred)

mean(lambda_pred > 15)
## [1] 0.01726667

Using these \(\lambda\) draws, we can go to the observation level and simulate the number of chips per cookie, which takes into account the uncertainty in \(\lambda\):

y_pred = rpois(n=n_sim, lambda=lambda_pred)
hist(y_pred)

mean(y_pred > 15)
## [1] 0.05833333
hist(dat$chips)
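
As a quick sanity check, we could compare this predictive tail probability with the corresponding empirical frequency in the observed data; rough agreement supports the model:

mean(dat$chips > 15)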

Finally, we could answer questions like: what is the posterior probability that the next cookie produced in Location 1 will have fewer than seven chips?

y_pred1 = rpois(n=n_sim, lambda=mod_csim[,"lambda[1]"])
hist(y_pred1)

mean(y_pred1<7)
## [1] 0.1910667

Random intercept linear model

We can extend the linear model for the Leinhardt data on infant mortality by incorporating the region variable. We’ll do this with a hierarchical model, where each region has its own intercept.

library("car")
## Loading required package: carData
data("Leinhardt")
?Leinhardt
str(Leinhardt)
## 'data.frame':    105 obs. of  4 variables:
##  $ income: int  3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
##  $ infant: num  26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
##  $ region: Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
##  $ oil   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
pairs(Leinhardt)

head(Leinhardt)
##           income infant   region oil
## Australia   3426   26.7     Asia  no
## Austria     3350   23.7   Europe  no
## Belgium     3346   17.0   Europe  no
## Canada      4751   16.8 Americas  no
## Denmark     5029   13.5   Europe  no
## Finland     3312   10.1   Europe  no

Previously, we worked with infant mortality and income on the logarithmic scale. Recall also that we had to remove some missing data.

dat = na.omit(Leinhardt)
dat$logincome = log(dat$income)
dat$loginfant = log(dat$infant)
str(dat)
## 'data.frame':    101 obs. of  6 variables:
##  $ income   : int  3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
##  $ infant   : num  26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
##  $ region   : Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
##  $ oil      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ logincome: num  8.14 8.12 8.12 8.47 8.52 ...
##  $ loginfant: num  3.28 3.17 2.83 2.82 2.6 ...
##  - attr(*, "na.action")= 'omit' Named int [1:4] 24 83 86 91
##   ..- attr(*, "names")= chr [1:4] "Iran" "Haiti" "Laos" "Nepal"

Now we can fit the proposed model:

library("rjags")

mod_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = a[region[i]] + b[1]*log_income[i] + b[2]*is_oil[i]
  }
  
  for (j in 1:max(region)) {
    a[j] ~ dnorm(a0, prec_a)
  }
  
  a0 ~ dnorm(0.0, 1.0/1.0e6)
  prec_a ~ dgamma(1/2.0, 1*10.0/2.0)
  tau = sqrt( 1.0 / prec_a )
  
  for (j in 1:2) {
    b[j] ~ dnorm(0.0, 1.0/1.0e6)
  }
  
  prec ~ dgamma(5/2.0, 5*10.0/2.0)
  sig = sqrt( 1.0 / prec )
} "

set.seed(116)
data_jags = list(y=dat$loginfant, log_income=dat$logincome,
                  is_oil=as.numeric(dat$oil=="yes"), region=as.numeric(dat$region))
data_jags$is_oil
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
table(data_jags$is_oil, data_jags$region)
##    
##      1  2  3  4
##   0 31 20 24 18
##   1  3  2  3  0
params = c("a0", "a", "b", "sig", "tau")

mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 101
##    Unobserved stochastic nodes: 9
##    Total graph size: 622
## 
## Initializing model
update(mod, 1e3) # burn-in

mod_sim = coda.samples(model=mod,
                       variable.names=params,
                       n.iter=5e3)

mod_csim = as.mcmc(do.call(rbind, mod_sim)) # combine multiple chains

## convergence diagnostics
plot(mod_sim)

gelman.diag(mod_sim)
## Potential scale reduction factors:
## 
##      Point est. Upper C.I.
## a[1]       1.01       1.02
## a[2]       1.01       1.02
## a[3]       1.00       1.02
## a[4]       1.01       1.02
## a0         1.00       1.00
## b[1]       1.01       1.02
## b[2]       1.00       1.00
## sig        1.00       1.00
## tau        1.00       1.00
## 
## Multivariate psrf
## 
## 1.01
autocorr.diag(mod_sim)
##             a[1]      a[2]      a[3]      a[4]        a0      b[1]       b[2]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000
## Lag 1  0.9215130 0.9249983 0.9208515 0.9363881 0.2417391 0.9802633 0.13757587
## Lag 5  0.8516442 0.8540759 0.8481198 0.8654878 0.2401784 0.9063704 0.03682142
## Lag 10 0.7755400 0.7761437 0.7695167 0.7887239 0.2041248 0.8241430 0.02367128
## Lag 50 0.3843723 0.3799484 0.3762810 0.3880574 0.1091399 0.4059433 0.01766002
##                sig          tau
## Lag 0  1.000000000  1.000000000
## Lag 1  0.062517244  0.278337049
## Lag 5  0.023289479  0.017032331
## Lag 10 0.007524789 -0.004626403
## Lag 50 0.020324507  0.028045368
autocorr.plot(mod_sim)

effectiveSize(mod_sim)
##       a[1]       a[2]       a[3]       a[4]         a0       b[1]       b[2] 
##   157.5292   159.9618   160.1749   152.2201   614.8942   149.4727  5170.9158 
##        sig        tau 
## 11894.6322  7913.6889

Results

Convergence looks acceptable, although the intercepts and b[1] mix slowly (effective sample sizes of only about 150), so their Monte Carlo standard errors are relatively large. Let’s compare this model with the old one from Lesson 7 using DIC:

dic.samples(mod, n.iter=1e3)
## Mean deviance:  214.3 
## penalty 7.314 
## Penalized deviance: 221.6

It appears that this model is an improvement over the non-hierarchical one we fit earlier. Notice that the penalty term, which can be interpreted as the “effective” number of parameters, is less than the actual number of parameters (nine). There are fewer “effective” parameters because they “share” information, or “borrow strength,” from each other in the hierarchical structure. If we had skipped the hierarchy and fit a single intercept, there would have been four parameters. If we had fit separate, independent intercepts for each region, there would have been seven parameters, which is close to what we ended up with.
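
For reference, here is a minimal sketch of how the single-intercept (non-hierarchical) model could be refit for a direct DIC comparison; the names (mod_flat_string, data_flat) are illustrative, and the priors mirror the ones above rather than reproducing Lesson 7 exactly:

mod_flat_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = b[1] + b[2]*log_income[i] + b[3]*is_oil[i]
  }
  for (j in 1:3) {
    b[j] ~ dnorm(0.0, 1.0/1.0e6)
  }
  prec ~ dgamma(5/2.0, 5*10.0/2.0)
} "

data_flat = data_jags[c("y", "log_income", "is_oil")]  # drop the unused region index
mod_flat = jags.model(textConnection(mod_flat_string), data=data_flat, n.chains=3)
update(mod_flat, 1e3)
dic.samples(mod_flat, n.iter=1e3)  # compare with the penalized deviance above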

Finally, let’s look at the posterior summary.

summary(mod_sim)
## 
## Iterations = 1001:6000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean      SD  Naive SE Time-series SE
## a[1]  6.5837 0.56419 0.0046066      0.0448626
## a[2]  6.0464 0.70955 0.0057934      0.0559683
## a[3]  5.8781 0.63496 0.0051845      0.0500541
## a[4]  5.5827 0.87072 0.0071094      0.0702442
## a0    6.0285 1.31976 0.0107758      0.0533149
## b[1] -0.3469 0.10766 0.0008790      0.0087856
## b[2]  0.6500 0.35257 0.0028787      0.0049977
## sig   0.9199 0.06572 0.0005366      0.0006036
## tau   2.0387 1.03234 0.0084291      0.0115973
## 
## 2. Quantiles for each variable:
## 
##         2.5%     25%     50%     75%   97.5%
## a[1]  5.4286  6.2160  6.5931  6.9612  7.6709
## a[2]  4.6185  5.5801  6.0490  6.5228  7.4243
## a[3]  4.5844  5.4616  5.8892  6.3117  7.1025
## a[4]  3.8269  5.0096  5.5955  6.1670  7.2619
## a0    3.4330  5.2538  6.0297  6.8171  8.5789
## b[1] -0.5556 -0.4192 -0.3480 -0.2768 -0.1268
## b[2] -0.0409  0.4146  0.6538  0.8886  1.3293
## sig   0.8027  0.8731  0.9157  0.9606  1.0611
## tau   0.9703  1.4048  1.7834  2.3548  4.7038

In this particular model, the intercepts do not have a real interpretation because they correspond to the mean response for a country that does not produce oil and has a log-income of 0, i.e., an income of $1 per capita. We can interpret \(a_0\) as the overall mean intercept and \(\tau\) as the standard deviation of intercepts across regions.
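
As an aside (a sketch only, not something we fit here), centering log-income before fitting would give the intercepts a direct interpretation as the expected log-infant-mortality for a non-oil country at the average income level:

data_jags$log_income = data_jags$log_income - mean(data_jags$log_income)
## refitting with the centered covariate leaves the slopes unchanged but
## shifts the intercepts to an interpretable baseline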

Other models

We have not investigated adding interaction terms, which might be appropriate. We considered a hierarchy on the intercepts only, but nothing prevents us from doing the same for other terms in the model, such as the coefficients for income and oil. We could try any or all of these alternatives and see how the DIC changes for those models. This, together with the other model-checking techniques we have discussed, could be used to identify the best model for making inferences and predictions. One such extension, with a hierarchical prior on the income coefficients, is sketched below.
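
This sketch (with illustrative names mod_rs_string, b_inc, b_oil, and priors chosen to mirror the model above) gives each region its own income slope, drawn from a common normal distribution:

mod_rs_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = a[region[i]] + b_inc[region[i]]*log_income[i] + b_oil*is_oil[i]
  }
  for (j in 1:max(region)) {
    a[j] ~ dnorm(a0, prec_a)
    b_inc[j] ~ dnorm(b0, prec_b)  # region-specific income slopes
  }
  a0 ~ dnorm(0.0, 1.0/1.0e6)
  b0 ~ dnorm(0.0, 1.0/1.0e6)
  b_oil ~ dnorm(0.0, 1.0/1.0e6)
  prec_a ~ dgamma(1/2.0, 1*10.0/2.0)
  prec_b ~ dgamma(1/2.0, 1*10.0/2.0)
  prec ~ dgamma(5/2.0, 5*10.0/2.0)
} "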

Quiz Questions

In previous lessons, we fit models to data representing percent growth in personnel for companies in two industries. Below are additional data from the two original industries (with 10 and 6 companies, respectively), as well as 3 additional industries. Percent growth is reported for a total of 53 companies.

dat = read.table(file="data_files/pctgrowth.csv", sep=",", header=TRUE)
head(dat)
##      y grp
## 1  1.2   1
## 2  1.4   1
## 3 -0.5   1
## 4  0.3   1
## 5  0.9   1
## 6  2.3   1
table(dat$grp)
## 
##  1  2  3  4  5 
## 10  6  8 15 14
hist(dat$y)

boxplot(y~grp, data=dat)

Rather than fit 5 separate models, one for each industry, we can fit a hierarchical model. As before, we assume a normal likelihood and common variance across all observations. Each industry will have its own mean growth, and each of these means will come from a common distribution, from which we will estimate the overall mean and the variability across industries.

Let \(i\) index the individual companies, and \(g_i\) indicate the industry (grp variable in pctgrowth) for company \(i\). How do we describe the hierarchical model?

\(y_i \space |\space \theta_{g_i}, \sigma^2 \space \stackrel{ind}{\sim} \space N(\theta_{g_i},\sigma^2), \space \space i=1,...,53,\space \space g_i \in \{1,...,5\}\)

\(\theta_g \space |\space \mu, \tau^2 \space \stackrel{iid}{\sim} \space N(\mu,\tau^2), \space \space g = 1,...,5\)

\(\mu \space {\sim} \space N(0,10^6)\)

\(\tau^2 \space {\sim} \space IG(\frac{1}{2},\frac{1\cdot 3}{2})\)

\(\sigma^2 \space {\sim} \space IG(\frac{2}{2},\frac{2\cdot 1}{2})\)

Fit the hierarchical model in JAGS and obtain posterior mean estimates for each industry’s mean growth (the posterior mean for each \(\theta_g\)).

mod_quiz_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(theta[grp[i]], prec)
  }
  
  for (j in 1:max(grp)) {
    theta[j] ~ dnorm(mu, tau_sq)
  }
  
  mu ~ dnorm(0, 1/1e6)
  tau_sq ~ dgamma(1.0/2.0, 1.0*3.0/2.0)
  prec ~ dgamma(2.0/2.0, 2*1/2)
  sig = sqrt(1/prec)

} "

set.seed(113)

data_jags = as.list(dat)

params = c("theta", "mu", "sig")

mod_quiz = jags.model(textConnection(mod_quiz_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 53
##    Unobserved stochastic nodes: 8
##    Total graph size: 127
## 
## Initializing model
update(mod_quiz, 1e3)

mod_quiz_sim = coda.samples(model=mod_quiz,
                       variable.names=params,
                       n.iter=5e3)
mod_quiz_csim = as.mcmc(do.call(rbind, mod_quiz_sim))

## convergence diagnostics
plot(mod_quiz_sim)

gelman.diag(mod_quiz_sim)
## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## mu                1          1
## sig               1          1
## theta[1]          1          1
## theta[2]          1          1
## theta[3]          1          1
## theta[4]          1          1
## theta[5]          1          1
## 
## Multivariate psrf
## 
## 1
autocorr.diag(mod_quiz_sim)
##                 mu          sig     theta[1]     theta[2]      theta[3]
## Lag 0  1.000000000  1.000000000  1.000000000  1.000000000  1.0000000000
## Lag 1  0.058952720  0.099726250  0.044981098  0.072779363  0.0310882828
## Lag 5  0.004625128 -0.004357823 -0.005111150 -0.004268041 -0.0007112292
## Lag 10 0.011624889 -0.014136970  0.005400666  0.001406345 -0.0114713766
## Lag 50 0.021108219 -0.003905373  0.026416230  0.019192014  0.0065659741
##            theta[4]     theta[5]
## Lag 0   1.000000000  1.000000000
## Lag 1   0.023829173  0.015875327
## Lag 5   0.004887581 -0.006855590
## Lag 10 -0.001277870 -0.007358193
## Lag 50  0.006341703 -0.011927842
autocorr.plot(mod_quiz_sim)

effectiveSize(mod_quiz_sim)
##       mu      sig theta[1] theta[2] theta[3] theta[4] theta[5] 
## 13145.35 12308.36 13260.63 12005.80 13179.93 14374.73 14732.24
## compute DIC
dic = dic.samples(mod_quiz, n.iter=1e3)

pm_params = apply(mod_quiz_csim, 2, mean)
means_theta = pm_params[-c(1,2)]

We are interested in comparing these estimates to those obtained from the model that assumes no hierarchy (the ANOVA cell means model). We can approximate the posterior estimates for the 5 industry means under a noninformative prior by simply calculating the sample mean growth for the five industries. We can do this in R with:

means_anova = tapply(dat$y, INDEX=dat$grp, FUN=mean)
## dat is the data read from pctgrowth.csv

plot(means_anova)
points(means_theta, col="red") ## where means_theta are the posterior point estimates for the industry means.

The estimates from the hierarchical model exhibit less variability than those from the ANOVA model: they are shrunk toward the overall mean.

In our hierarchical model for personnel growth, we assumed that the variability between companies within an industry was constant across industries (\(\sigma^2\) was the same for all industries). How might we decide whether each industry instead needs its own variance parameter?

Answer: Calculate the posterior probability that \(\sigma^2/\tau^2>1\) in the original model. If this probability exceeds a pre-determined amount, use a model with separate variance parameters.
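
A minimal sketch of that calculation, assuming we re-run the sampler with tau_sq added to the monitored parameters (it was not included in params above), and recalling that tau_sq in the JAGS code is the precision \(1/\tau^2\):

params_var = c("sig", "tau_sq")
mod_quiz_sim2 = coda.samples(model=mod_quiz, variable.names=params_var, n.iter=5e3)
csim2 = as.mcmc(do.call(rbind, mod_quiz_sim2))
sig2 = csim2[,"sig"]^2         # within-industry variance
tau2 = 1.0 / csim2[,"tau_sq"]  # between-industry variance
mean(sig2 / tau2 > 1.0)        # Monte Carlo estimate of P(sigma^2/tau^2 > 1 | data)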

Consider once again the OME data in the MASS package in R, which we explored earlier. The data consist of experimental results from tests of auditory perception in children. Under varying conditions and for multiple trials under each condition, children either correctly or incorrectly identified the source of the changing signals.

Recall that the model looked like this:

\(y_i \space | \space \phi_i \stackrel{ind}{\sim} Binomial(n_i,\phi_i), \space \space i=1,...,712\)

\(logit(\phi_i)=\beta_0+\beta_1 Age_i+\beta_2 I_{(OME_i=low)}+\beta_3 Loud_i+\beta_4 I_{(Noise_i=incoherent)}\)

\(\beta_0 \sim N(0,5^2)\)

\(\beta_k \stackrel{iid}{\sim} N(0,4^2),\space \space k=1,...,4.\)

As with other models, we will extend the intercept (and rename it) so that the linear part of the model looks like this:

\(logit(\phi_i)=\alpha_{(ID_i)}+\beta_1 Age_i+\beta_2 I_{(OME_i=low)}+\beta_3 Loud_i+\beta_4 I_{(Noise_i=incoherent)}\)

where \(ID_i\) is an index identifying the child for observation \(i\).

The hierarchical prior for the intercepts would look like this:

\(\alpha_j \stackrel{iid}{\sim} N(\mu,\tau^2),\space \space j=1,...,63\) (There are 63 children).

followed by the priors for \(\mu\) and \(\tau^2\),

\(\mu \sim N(0,10^2)\)

\(\tau^2 \sim IG(\frac{1}{2},\frac{1}{2})\)

\(\tau^2\) represents the variability of the child-specific intercepts \(\alpha_j\), that is, the variability in baseline performance across children (on the logit scale).

We fit the hierarchical model outlined above using JAGS:

library("MASS")
data("OME")

dat = subset(OME, OME != "N/A")
dat$OME = factor(dat$OME) # relabel OME
dat$ID = as.numeric(factor(dat$ID)) # relabel ID so there are no gaps in numbers (they now go from 1 to 63)

## Original reference model and covariate matrix
mod_glm = glm(Correct/Trials ~ Age + OME + Loud + Noise, data=dat, weights=Trials, family="binomial")
X = model.matrix(mod_glm)[,-1]

## Hierarchical model with child-specific intercepts
mod2_quiz_string = " model {
    for (i in 1:length(y)) {
        y[i] ~ dbin(phi[i], n[i])
        logit(phi[i]) = alpha[ID[i]] + b[1]*Age[i] + b[2]*OMElow[i] + b[3]*Loud[i] + b[4]*Noiseincoherent[i]
    }
     
    for (k in 1:max(ID)) {
      alpha[k] ~ dnorm(mu, prec_alpha)  # hierarchical prior on the child intercepts
    }
    
    for (j in 1:4) {
        b[j] ~ dnorm(0.0, 1.0/4.0^2)
    }
    
    mu ~ dnorm(0.0, 1.0/10.0^2)
    prec_alpha ~ dgamma(1.0/2.0, 1.0/2.0)
    tau = 1/prec_alpha  # note: this is the variance tau^2, not the standard deviation
} "

data_jags = as.list(as.data.frame(X))
data_jags$y = dat$Correct
data_jags$n = dat$Trials
data_jags$ID = dat$ID

How do the convergence diagnostics look?

params = c("alpha", "b", "mu", "tau")

mod2_quiz = jags.model(textConnection(mod2_quiz_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 712
##    Unobserved stochastic nodes: 69
##    Total graph size: 6499
## 
## Initializing model
update(mod2_quiz, 1e3)

mod2_quiz_sim = coda.samples(model=mod2_quiz,
                       variable.names=params,
                       n.iter=5e3)
mod2_quiz_csim = as.mcmc(do.call(rbind, mod2_quiz_sim))

## convergence diagnostics
plot(mod2_quiz_sim)

gelman.diag(mod2_quiz_sim)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## alpha[1]        1.06       1.18
## alpha[2]        1.06       1.19
## alpha[3]        1.06       1.16
## alpha[4]        1.07       1.20
## alpha[5]        1.07       1.19
## alpha[6]        1.05       1.14
## alpha[7]        1.06       1.18
## alpha[8]        1.06       1.17
## alpha[9]        1.05       1.15
## alpha[10]       1.05       1.16
## alpha[11]       1.07       1.21
## alpha[12]       1.06       1.17
## alpha[13]       1.05       1.15
## alpha[14]       1.06       1.17
## alpha[15]       1.06       1.18
## alpha[16]       1.05       1.16
## alpha[17]       1.06       1.17
## alpha[18]       1.05       1.15
## alpha[19]       1.05       1.15
## alpha[20]       1.06       1.19
## alpha[21]       1.05       1.14
## alpha[22]       1.05       1.15
## alpha[23]       1.04       1.13
## alpha[24]       1.07       1.20
## alpha[25]       1.06       1.18
## alpha[26]       1.05       1.15
## alpha[27]       1.05       1.16
## alpha[28]       1.06       1.17
## alpha[29]       1.05       1.16
## alpha[30]       1.06       1.17
## alpha[31]       1.05       1.16
## alpha[32]       1.05       1.14
## alpha[33]       1.06       1.17
## alpha[34]       1.05       1.16
## alpha[35]       1.05       1.15
## alpha[36]       1.05       1.15
## alpha[37]       1.05       1.15
## alpha[38]       1.05       1.16
## alpha[39]       1.05       1.15
## alpha[40]       1.05       1.14
## alpha[41]       1.05       1.15
## alpha[42]       1.05       1.16
## alpha[43]       1.05       1.17
## alpha[44]       1.05       1.16
## alpha[45]       1.06       1.17
## alpha[46]       1.05       1.16
## alpha[47]       1.05       1.15
## alpha[48]       1.06       1.18
## alpha[49]       1.06       1.17
## alpha[50]       1.06       1.18
## alpha[51]       1.04       1.11
## alpha[52]       1.05       1.15
## alpha[53]       1.06       1.17
## alpha[54]       1.05       1.14
## alpha[55]       1.04       1.13
## alpha[56]       1.05       1.16
## alpha[57]       1.06       1.18
## alpha[58]       1.06       1.17
## alpha[59]       1.05       1.16
## alpha[60]       1.06       1.17
## alpha[61]       1.05       1.14
## alpha[62]       1.05       1.15
## alpha[63]       1.06       1.18
## b[1]            1.02       1.06
## b[2]            1.02       1.07
## b[3]            1.05       1.14
## b[4]            1.01       1.03
## mu              1.09       1.24
## tau             1.00       1.01
## 
## Multivariate psrf
## 
## 1.06
autocorr.diag(mod2_quiz_sim)
##         alpha[1]  alpha[2]  alpha[3]  alpha[4]  alpha[5]  alpha[6]  alpha[7]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8130223 0.8293075 0.8115851 0.8696703 0.8425437 0.7483259 0.8349653
## Lag 5  0.7450226 0.7572144 0.7229225 0.8092091 0.7767573 0.6553023 0.7699544
## Lag 10 0.7234832 0.7335872 0.6962484 0.7919503 0.7561904 0.6329739 0.7503647
## Lag 50 0.5968197 0.6047085 0.5686419 0.6514262 0.6161827 0.5117982 0.6163542
##         alpha[8]  alpha[9] alpha[10] alpha[11] alpha[12] alpha[13] alpha[14]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8176261 0.7630118 0.8097604 0.8445401 0.7454887 0.7541190 0.7757786
## Lag 5  0.7439951 0.6678375 0.7335642 0.7736074 0.6587334 0.6650232 0.6917111
## Lag 10 0.7229459 0.6452407 0.7089162 0.7548059 0.6336756 0.6438805 0.6745015
## Lag 50 0.5894430 0.5225818 0.5815798 0.6155115 0.5272547 0.5316267 0.5610539
##        alpha[15] alpha[16] alpha[17] alpha[18] alpha[19] alpha[20] alpha[21]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8093912 0.7696902 0.7609933 0.8237798 0.7533637 0.8414174 0.7756744
## Lag 5  0.7436602 0.6752044 0.6804946 0.7423237 0.6568556 0.7725687 0.6920266
## Lag 10 0.7238134 0.6636006 0.6681797 0.7171647 0.6418287 0.7514398 0.6703877
## Lag 50 0.5927396 0.5445512 0.5458944 0.5741883 0.5234741 0.6216608 0.5450601
##        alpha[22] alpha[23] alpha[24] alpha[25] alpha[26] alpha[27] alpha[28]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8522713 0.7453876 0.8597214 0.7943771 0.7573685 0.7641177 0.7945568
## Lag 5  0.7808332 0.6468332 0.7953362 0.7183538 0.6650998 0.6671481 0.7141472
## Lag 10 0.7537552 0.6179759 0.7730019 0.7011015 0.6388640 0.6529467 0.6975519
## Lag 50 0.6066910 0.5097635 0.6329831 0.5859579 0.5241854 0.5448170 0.5756044
##        alpha[29] alpha[30] alpha[31] alpha[32] alpha[33] alpha[34] alpha[35]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.7612360 0.7775038 0.7635490 0.7688619 0.7678029 0.7728274 0.7606810
## Lag 5  0.6687761 0.6896979 0.6636890 0.6584833 0.6849511 0.6856613 0.6690782
## Lag 10 0.6493745 0.6738261 0.6485684 0.6310441 0.6575111 0.6647028 0.6472003
## Lag 50 0.5499544 0.5536014 0.5267862 0.5220846 0.5474053 0.5576233 0.5295319
##        alpha[36] alpha[37] alpha[38] alpha[39] alpha[40] alpha[41] alpha[42]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.7457680 0.7645536 0.7639567 0.7585865 0.7331716 0.7427533 0.7637745
## Lag 5  0.6558295 0.6684527 0.6751593 0.6701945 0.6300092 0.6227290 0.6852726
## Lag 10 0.6314418 0.6481560 0.6568250 0.6559775 0.6076281 0.6096812 0.6683980
## Lag 50 0.5112425 0.5355588 0.5476313 0.5396308 0.4986539 0.4948344 0.5509429
##        alpha[43] alpha[44] alpha[45] alpha[46] alpha[47] alpha[48] alpha[49]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.7586945 0.7569620 0.7584169 0.7490411 0.7512274 0.7505086 0.7643437
## Lag 5  0.6852788 0.6703085 0.6687643 0.6602383 0.6663659 0.6655819 0.6731325
## Lag 10 0.6553208 0.6527768 0.6554543 0.6353327 0.6478861 0.6510697 0.6533007
## Lag 50 0.5529633 0.5394169 0.5404847 0.5285744 0.5276167 0.5335050 0.5437481
##        alpha[50] alpha[51] alpha[52] alpha[53] alpha[54] alpha[55] alpha[56]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.7579684 0.7677906 0.7341163 0.7710255 0.7556683 0.7272354 0.7587437
## Lag 5  0.6735961 0.6822070 0.6435516 0.6823084 0.6595049 0.6093638 0.6764942
## Lag 10 0.6495666 0.6579393 0.6236359 0.6636353 0.6410143 0.5968300 0.6532450
## Lag 50 0.5396658 0.5411084 0.5161987 0.5446058 0.5313454 0.4808428 0.5415317
##        alpha[57] alpha[58] alpha[59] alpha[60] alpha[61] alpha[62] alpha[63]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8020383 0.8045846 0.7477918 0.7783592 0.7655815 0.7413319 0.7889189
## Lag 5  0.7256209 0.7291158 0.6551824 0.6985613 0.6834527 0.6536593 0.7017324
## Lag 10 0.7022183 0.7096775 0.6418935 0.6760613 0.6636691 0.6366767 0.6874389
## Lag 50 0.5790499 0.5827975 0.5317159 0.5590306 0.5396007 0.5248642 0.5621312
##             b[1]       b[2]      b[3]       b[4]        mu        tau
## Lag 0  1.0000000 1.00000000 1.0000000 1.00000000 1.0000000 1.00000000
## Lag 1  0.9482476 0.89026813 0.9818307 0.51825322 0.9888970 0.67910525
## Lag 5  0.7819145 0.57526296 0.9179675 0.06987674 0.9691418 0.29785465
## Lag 10 0.6367783 0.36579380 0.8535790 0.05097349 0.9452129 0.11528200
## Lag 50 0.2282313 0.04818833 0.5738522 0.05615796 0.7795253 0.01451548
autocorr.plot(mod2_quiz_sim)

effectiveSize(mod2_quiz_sim)
##   alpha[1]   alpha[2]   alpha[3]   alpha[4]   alpha[5]   alpha[6]   alpha[7] 
##   69.93522   58.39308   66.07708   57.28802   55.77079   88.35612   60.52119 
##   alpha[8]   alpha[9]  alpha[10]  alpha[11]  alpha[12]  alpha[13]  alpha[14] 
##   57.27397   72.79412   70.52522   66.15330   74.92978   68.80849   60.14961 
##  alpha[15]  alpha[16]  alpha[17]  alpha[18]  alpha[19]  alpha[20]  alpha[21] 
##   66.03448   67.90636   62.74324   84.51870   62.55531   53.57929   70.63159 
##  alpha[22]  alpha[23]  alpha[24]  alpha[25]  alpha[26]  alpha[27]  alpha[28] 
##   76.61090   84.83912   57.03543   58.82517   73.01397   64.36792   62.54543 
##  alpha[29]  alpha[30]  alpha[31]  alpha[32]  alpha[33]  alpha[34]  alpha[35] 
##   72.53209   69.61810   76.06413   74.63763   66.47699   58.04596   70.30638 
##  alpha[36]  alpha[37]  alpha[38]  alpha[39]  alpha[40]  alpha[41]  alpha[42] 
##   75.55322   71.58101   61.17944   73.98547   73.01536   91.61746   66.81120 
##  alpha[43]  alpha[44]  alpha[45]  alpha[46]  alpha[47]  alpha[48]  alpha[49] 
##   67.95205   65.66096   70.75009   67.49881   67.10684   66.50257   79.04246 
##  alpha[50]  alpha[51]  alpha[52]  alpha[53]  alpha[54]  alpha[55]  alpha[56] 
##   63.40946   64.76242   66.67231   63.17561   68.87007   92.56175   67.68889 
##  alpha[57]  alpha[58]  alpha[59]  alpha[60]  alpha[61]  alpha[62]  alpha[63] 
##   58.49702   60.17151   68.06615   67.89447   73.05670   67.94268   64.95343 
##       b[1]       b[2]       b[3]       b[4]         mu        tau 
##  345.03949  799.64207  113.65802 2669.84486   37.66104 1872.65492
## compute DIC
dic = dic.samples(mod2_quiz, n.iter=1e3)
dic
## Mean deviance:  1240 
## penalty 28.07 
## Penalized deviance: 1268
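
These diagnostics are not reassuring: the potential scale reduction factors for the intercepts and \(\mu\) sit above 1.05, autocorrelation remains high even at lag 50, and the effective sample sizes are only in the tens for the \(\alpha\)’s (and about 38 for \(\mu\)). Before trusting posterior summaries from this model, we would want a much longer run, possibly with thinning, along these lines:

mod2_quiz_sim = coda.samples(model=mod2_quiz,
                       variable.names=params,
                       n.iter=5e4, thin=10)
gelman.diag(mod2_quiz_sim)
effectiveSize(mod2_quiz_sim)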